[peft] If AutoModel is wrapped with PEFT for prompt learning, then extend the attention mask
#3000
Resolves #2995, resolves huggingface/peft#2154
Hello!
Pull Request overview
Details
Sentence Transformer models are sometimes trained with the `AutoModel` wrapped in PEFT, as that can lower the computation cost of training. In particular, when PEFT with prompt learning is used, virtual tokens (or rather, just `inputs_embeds`) are prepended to the input, and the `attention_mask` is extended accordingly before the base `AutoModel` is called. However, the attention mask used in the Pooling module is then not updated to match, so pooling operates on a mask that is shorter than the hidden states. This PR fixes that.
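For illustration, here is a minimal sketch of the kind of adjustment involved. The helper name is hypothetical and this is not the PR diff itself; it only assumes the standard PEFT attributes (`active_peft_config`, `is_prompt_learning`, `num_virtual_tokens`):

```python
import torch
from peft import PeftModel


def extend_attention_mask_for_prompt_learning(model, attention_mask):
    """Hypothetical helper: prepend mask entries for the virtual tokens
    that PEFT prompt learning adds, so pooling sees a mask whose length
    matches the hidden states."""
    peft_config = getattr(model, "active_peft_config", None)
    if isinstance(model, PeftModel) and peft_config is not None and peft_config.is_prompt_learning:
        # Prompt learning prepends `num_virtual_tokens` embeddings, so the
        # pooling mask must gain the same number of attended positions.
        prefix = torch.ones(
            attention_mask.shape[0],
            peft_config.num_virtual_tokens,
            dtype=attention_mask.dtype,
            device=attention_mask.device,
        )
        attention_mask = torch.cat([prefix, attention_mask], dim=1)
    return attention_mask
```

With mean pooling this matters directly: the virtual-token positions contribute hidden states, so their mask entries must be present for the average to be computed over the right positions.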
Concern
My primary concern now is that the model doesn't seem to be able to train well, regardless of whether I use mean or CLS pooling.
@BenjaminBossan, could you 1) verify that the PR diff looks solid at a glance, and 2) let me know whether a model with this config is expected to train roughly as well as with "full" (non-PEFT) training?